@wxtim (Member) commented Oct 17, 2025

Closes #6053

DataPoint is due to be switched off on 1 December.

The data is available from the Met Office via the Amazon Sustainability Data Initiative (ASDI).

Annoyingly, the metadata there is quite limited, so I asked several contacts in Obs R&D, who insisted that the data has the following properties:

[image: the data properties]

This raises two problems:

  1. The image is in a different projection from the previous input data (Transverse rather than normal Mercator projection).
  2. The domain values given don't seem quite right (though this could just be a side effect of (1)).

Rather than re-jigging the mathematics, I've produced something vaguely plausible by adjusting the domain values. Hopefully this gives a product good enough for the training purpose it is built for.
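For illustration, a minimal sketch of the kind of linear lon/lat-to-pixel mapping that "adjusting the domain values" tunes. `Domain`, `lonlat_to_pixel` and the bounds below are hypothetical, not the tutorial's actual code. Because the mapping ignores the real Transverse Mercator projection, nudging the four domain edges is the only knob for making features land roughly in the right place.

```python
import math
from typing import NamedTuple, Tuple


class Domain(NamedTuple):
    """Hypothetical lon/lat bounding box of the image, in degrees."""
    west: float
    east: float
    south: float
    north: float


def lonlat_to_pixel(lon: float, lat: float, domain: Domain,
                    width: int, height: int) -> Tuple[int, int]:
    """Map (lon, lat) to (column, row) by linear interpolation across the domain.

    This assumes evenly spaced lon/lat across the image, i.e. it ignores
    the real projection entirely.
    """
    x_frac = (lon - domain.west) / (domain.east - domain.west)
    y_frac = (domain.north - lat) / (domain.north - domain.south)  # row 0 = north edge
    # floor (not int) so points just outside the west/north edges map to
    # negative indices instead of being truncated onto the edge pixel
    return math.floor(x_frac * width), math.floor(y_frac * height)


if __name__ == "__main__":
    uk = Domain(west=-10.0, east=2.0, south=48.0, north=60.0)  # illustrative values only
    print(lonlat_to_pixel(-4.0, 54.0, uk, 100, 100))  # (50, 50)
```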

Finally, I have fixed a bug I introduced when I added the SYNOP collecting routine: these wind observations use the meteorological convention (the direction the wind is blowing from), but we need the direction the wind is blowing to, so all wind directions were 180° off!
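For reference, a minimal sketch of the conversion involved (the function names are hypothetical, not taken from the tutorial code). Either flip the direction by 180° explicitly, or keep the "from" direction and negate the components when computing eastward/northward wind (u, v); both give arrows pointing where the wind is going.

```python
import math


def direction_to(direction_from_deg: float) -> float:
    """Convert a meteorological wind direction (blowing FROM, degrees
    clockwise from north) into the direction the wind is blowing TO."""
    return (direction_from_deg + 180.0) % 360.0


def wind_components(speed: float, direction_from_deg: float):
    """Return (u, v): the eastward and northward wind components.

    The minus signs perform the 180-degree flip: a wind FROM the north
    (0 degrees) blows southwards, so v comes out negative.
    """
    theta = math.radians(direction_from_deg)
    return -speed * math.sin(theta), -speed * math.cos(theta)


if __name__ == "__main__":
    print(direction_to(270.0))  # 90.0: a westerly wind blows towards the east
```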

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@oliver-sanders oliver-sanders changed the title Killdatapoint tutorial: replace deprecated DataPoint API Oct 17, 2025
@oliver-sanders oliver-sanders marked this pull request as draft October 17, 2025 09:22
@wxtim wxtim force-pushed the killdatapoint branch 3 times, most recently from d109fc3 to f684f1d October 17, 2025 10:05
@oliver-sanders (Member)

@wxtim, FYI, this will need to go into a bugfix release.

@wxtim wxtim marked this pull request as ready for review October 21, 2025 12:36

@oliver-sanders (Member) left a comment

PR is on the wrong branch.

Some small comments so far:

@oliver-sanders (Member)

It looks like we have multiple options for reading HDF5 files. h5py is one, but pandas and xarray are apparently alternatives.

pandas might be an appealing option as it is widely used and is already an optional dependency.

Can we have a quick review of the options to work out which is the most lightweight, easiest to support, least likely to cause problems, etc.?
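One quick, stdlib-only starting point for that comparison is to list what each candidate declares as hard requirements in the current environment. A minimal sketch using `importlib.metadata`; the helper names are hypothetical, and the names passed in are PyPI distribution names (PyTables is published as `tables`).

```python
import re
from importlib import metadata
from typing import List, Optional

# Matches environment markers that restrict a requirement to an extra,
# e.g. 'pytest; extra == "test"'.
_EXTRA_MARKER = re.compile(r"\bextra\s*==")


def drop_extras(reqs: List[str]) -> List[str]:
    """Keep only requirements that are not tied to an optional extra."""
    return [r for r in reqs if not _EXTRA_MARKER.search(r)]


def hard_requirements(dist_name: str) -> Optional[List[str]]:
    """Return a distribution's non-optional requirements, or None if not installed."""
    try:
        reqs = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None
    return drop_extras(reqs)


if __name__ == "__main__":
    for name in ("h5py", "tables", "pandas", "xarray"):
        print(name, hard_requirements(name))
```

This only shows what is declared, not what a given code path (e.g. `pandas.read_hdf`) imports at runtime, so it complements rather than replaces reading the docs and source.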

@wxtim (Member Author) commented Oct 23, 2025

I think they may all be the same: the documentation suggests that pandas.read_hdf uses PyTables, whose docs suggest that it uses h5py. Looking at the pandas source shows h5py is an optional dependency. I think it's going to be h5py whatever we choose.
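For comparison, reading with h5py directly is short. A minimal sketch; the file name and dataset path are hypothetical (the layout of the ASDI files isn't documented here), so listing the file's contents first is the safer way to find the right path.

```python
def list_hdf5_objects(path):
    """Return the names of every group and dataset in an HDF5 file."""
    import h5py  # imported lazily so h5py stays an optional dependency

    names = []
    with h5py.File(path, "r") as h5:
        h5.visit(names.append)  # append() returns None, so visiting continues
    return names


def read_hdf5_dataset(path, dataset_path):
    """Return (data, attrs) for one dataset, e.g. a rainfall grid."""
    import h5py

    with h5py.File(path, "r") as h5:
        dset = h5[dataset_path]
        data = dset[()]            # read the whole dataset into a NumPy array
        attrs = dict(dset.attrs)   # copy the attributes before the file closes
    return data, attrs
```

Usage would be along the lines of `list_hdf5_objects("radar.h5")` to find the dataset, then `read_hdf5_dataset("radar.h5", "<dataset path>")`.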

@wxtim wxtim requested a review from oliver-sanders October 29, 2025 10:22
@oliver-sanders (Member)

FYI: there's a wrap-around problem, with rainfall leaking out of the western edge of the domain into the east:

(Possibly this was a pre-existing issue?)

[Screenshot from 2025-10-30 12-48-25]
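One common way this kind of leak happens in Python (not confirmed as the cause here) is negative indexing: a point just west of the domain maps to column -1, which silently addresses the last, easternmost column. NumPy arrays behave the same way. A minimal sketch of the guard, using a hypothetical `add_rain` helper and a list-of-lists grid:

```python
from typing import List


def add_rain(grid: List[List[float]], col: int, row: int, amount: float) -> bool:
    """Accumulate rainfall into grid[row][col]; ignore points outside the domain.

    Without the bounds check, col == -1 (just west of the domain) would be
    written into the last column (the far east) because Python treats
    negative indices as offsets from the end of a sequence.
    """
    height, width = len(grid), len(grid[0])
    if 0 <= row < height and 0 <= col < width:
        grid[row][col] += amount
        return True
    return False  # outside the domain: drop it rather than wrap around
```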

@oliver-sanders (Member) commented Oct 30, 2025

The get-rainfall step is really slow (~1:40). After a bit of fiddling, I managed to get that down to ~0:30, which is reasonable.

Note: it uses ~25% CPU (but negligible RAM) on my box.

wxtim#75

@wxtim (Member Author) commented Oct 30, 2025

Not me guv.

[image]

I'm going to leave this. It's probably not important enough to matter.

@wxtim wxtim added this to the 7.8.x milestone Nov 5, 2025
@wxtim wxtim added doc Documentation dependencies labels Nov 5, 2025

@oliver-sanders (Member) left a comment

LGTM.

Tested as working.

@wxtim wxtim changed the base branch from master to 8.6.x November 13, 2025 13:17
@wxtim wxtim requested a review from oliver-sanders November 13, 2025 13:20

@oliver-sanders (Member) commented Nov 13, 2025

Might need to delete some commits after the base change.

@wxtim, soz, this branch still has 8.7.x commits post-rebase.

@oliver-sanders (Member)

Looking good, assigned @ChrisPaulBennett for a quick skim.

Note: we will need follow-up PRs in:

  • cylc-doc
  • metomi-rose

@wxtim wxtim force-pushed the killdatapoint branch 2 times, most recently from e984c2e to 1351a9d November 14, 2025 10:26

Labels: dependencies, doc (Documentation)

Projects: None yet

Development: successfully merging this pull request may close the issue "tutorial workflow: move away from datapoint".

2 participants